Kernel ridge vs. principal component regression: Minimax bounds and the qualification of regularization operators

Authors

  • Lee H. Dicker
  • Dean P. Foster
  • Daniel Hsu
Abstract

Regularization is an essential element of virtually all kernel methods for nonparametric regression problems. A critical factor in the effectiveness of a given kernel method is the type of regularization that is employed. This article compares and contrasts members from a general class of regularization techniques, which notably includes ridge regression and principal component regression. We derive an explicit finite-sample risk bound for regularization-based estimators that simultaneously accounts for (i) the structure of the ambient function space, (ii) the regularity of the true regression function, and (iii) the adaptability (or qualification) of the regularization. A simple consequence of this upper bound is that the risk of the regularization-based estimators matches the minimax rate in a variety of settings. The general bound also illustrates how some regularization techniques are more adaptable than others to favorable regularity properties that the true regression function may possess. This, in particular, demonstrates a striking difference between kernel ridge regression and kernel principal component regression. Our theoretical results are supported by numerical experiments.

MSC 2010 subject classifications: 62G08.

∗DH acknowledges support from NSF grants DMR-1534910 and IIS-1563785 and a Sloan Research Fellowship; LHD’s work was partially supported by NSF grants DMS-1208785 and DMS-1454817.
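The contrast the abstract draws can be illustrated with a small sketch: both kernel ridge regression and kernel PCR act as "spectral filters" on the eigendecomposition of the kernel Gram matrix, differing only in the filter function. The data, bandwidth, ridge parameter `lam`, and component count `m` below are all arbitrary illustrative choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D regression problem (illustrative only; not data from the paper)
n = 100
x = np.sort(rng.uniform(-1.0, 1.0, n))
y = np.sin(np.pi * x) + 0.1 * rng.standard_normal(n)

# Gaussian kernel Gram matrix; the bandwidth 0.1 is an arbitrary choice
K = np.exp(-((x[:, None] - x[None, :]) ** 2) / 0.1)
evals, evecs = np.linalg.eigh(K)
evals, evecs = evals[::-1], evecs[:, ::-1]    # sort eigenvalues descending

lam = 0.1   # ridge parameter (hypothetical)
m = 10      # number of principal components kept by PCR (hypothetical)

# Ridge shrinks every eigendirection by the factor s / (s + lam) ...
ridge_filter = evals / (evals + lam)
# ... while PCR keeps the top-m directions exactly and discards the rest
pcr_filter = (np.arange(n) < m).astype(float)

coords = evecs.T @ y                           # y in the eigenbasis
fit_ridge = evecs @ (ridge_filter * coords)    # kernel ridge fitted values
fit_pcr = evecs @ (pcr_filter * coords)        # kernel PCR fitted values
```

The 0/1 filter of PCR never saturates, which is the informal meaning of its higher qualification; the ridge filter always shrinks, even along directions where shrinkage costs bias.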


Related articles

Kernel ridge vs. principal component regression: minimax bounds and adaptability of regularization operators

Regularization is an essential element of virtually all kernel methods for nonparametric regression problems. A critical factor in the effectiveness of a given kernel method is the type of regularization that is employed. This article compares and contrasts members from a general class of regularization techniques, which notably includes ridge regression and principal component reg...


Kernel methods and regularization techniques for nonparametric regression: Minimax optimality and adaptation

Regularization is an essential element of virtually all kernel methods for nonparametric regression problems. A critical factor in the effectiveness of a given kernel method is the type of regularization that is employed. This article compares and contrasts members from a general class of regularization techniques, which notably includes ridge regression and principal component regression. We f...


Divide and conquer kernel ridge regression: a distributed algorithm with minimax optimal rates

We study a decomposition-based scalable approach to kernel ridge regression, and show that it achieves minimax optimal convergence rates under relatively mild conditions. The method is simple to describe: it randomly partitions a dataset of size N into m subsets of equal size, computes an independent kernel ridge regression estimator for each subset using a careful choice of the regularization ...
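The divide-and-conquer scheme described above can be sketched in a few lines: partition the data at random, fit an ordinary kernel ridge regression estimator on each subset, and average the local predictors. All sizes and tuning parameters below are toy choices for illustration, not the careful settings the paper analyzes.

```python
import numpy as np

rng = np.random.default_rng(1)

def krr_fit_predict(x_tr, y_tr, x_te, lam=1e-3, bw=0.1):
    """Plain kernel ridge regression with a Gaussian kernel (O(n^3) solve)."""
    K = np.exp(-((x_tr[:, None] - x_tr[None, :]) ** 2) / bw)
    alpha = np.linalg.solve(K + lam * np.eye(len(x_tr)), y_tr)
    K_te = np.exp(-((x_te[:, None] - x_tr[None, :]) ** 2) / bw)
    return K_te @ alpha

N, m = 600, 4                          # toy dataset size and partition count
x = rng.uniform(-1.0, 1.0, N)
y = np.sin(np.pi * x) + 0.1 * rng.standard_normal(N)
x_test = np.linspace(-1.0, 1.0, 50)

# Randomly partition the data into m equal subsets, fit KRR independently
# on each, then average the m local predictors.
perm = rng.permutation(N)
local_preds = [krr_fit_predict(x[idx], y[idx], x_test)
               for idx in np.array_split(perm, m)]
f_hat = np.mean(local_preds, axis=0)
```

Each local solve costs O((N/m)^3) rather than O(N^3), which is the source of the scalability; the paper's contribution is showing when the averaged estimator still attains the minimax rate.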


Early Stopping and Non-parametric Regression: An Optimal Data-dependent Stopping Rule

Early stopping is a form of regularization based on choosing when to stop running an iterative algorithm. Focusing on non-parametric regression in a reproducing kernel Hilbert space, we analyze the early stopping strategy for a form of gradient-descent applied to the least-squares loss function. We propose a data-dependent stopping rule that does not involve hold-out or cross-validation data, a...
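A minimal sketch of the setting this abstract describes: gradient descent on the least-squares loss over an RKHS, where the number of iterations plays the role of the regularization parameter. The data, kernel bandwidth, and iteration count are arbitrary illustrative choices; the paper's actual contribution, a data-dependent stopping rule, is not implemented here.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 80
x = np.sort(rng.uniform(-1.0, 1.0, n))
y = np.sin(np.pi * x) + 0.2 * rng.standard_normal(n)

# Gram matrix of a Gaussian kernel (bandwidth chosen arbitrarily)
K = np.exp(-((x[:, None] - x[None, :]) ** 2) / 0.1)
eta = 1.0 / np.linalg.eigvalsh(K).max()   # step size below 1/||K||

# Gradient descent on the least-squares loss; stopping at iteration T
# regularizes: small T = heavy smoothing, large T = interpolation.
alpha = np.zeros(n)
train_mse = []
for t in range(200):
    resid = y - K @ alpha            # current residuals
    alpha = alpha + eta * resid      # one gradient step in the RKHS
    train_mse.append(np.mean(resid ** 2))
```

Training error decreases monotonically with t, so it cannot be used to choose the stopping time by itself; that is why a principled, data-dependent rule of the kind the paper proposes is needed.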


Optimal Convergence for Distributed Learning with Stochastic Gradient Methods and Spectral-Regularization Algorithms

We study generalization properties of distributed algorithms in the setting of nonparametric regression over a reproducing kernel Hilbert space (RKHS). We first investigate distributed stochastic gradient methods (SGM), with mini-batches and multi-passes over the data. We show that optimal generalization error bounds can be retained for distributed SGM provided that the partition level is not t...
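A toy sketch of distributed SGM in this setting, assuming the simplest reading of the abstract: each worker runs multi-pass, mini-batch stochastic gradient descent in the RKHS on its own partition, and the local predictors are averaged. All sizes, step sizes, and bandwidths are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)

def kernel_sgm(x_tr, y_tr, passes=5, batch=10, eta=0.5, bw=0.1):
    """Mini-batch stochastic gradient in an RKHS with a Gaussian kernel."""
    n = len(x_tr)
    K = np.exp(-((x_tr[:, None] - x_tr[None, :]) ** 2) / bw)
    alpha = np.zeros(n)
    for _ in range(passes):                     # multi-pass over the data
        for start in range(0, n, batch):
            idx = slice(start, start + batch)
            resid = y_tr[idx] - K[idx] @ alpha  # mini-batch residuals
            alpha[idx] += eta * resid / batch   # stochastic gradient step
    return alpha

N, workers = 400, 4
x = rng.uniform(-1.0, 1.0, N)
y = np.sin(np.pi * x) + 0.1 * rng.standard_normal(N)

# Distributed variant: run SGM independently on each partition and
# average the local predictors at a common set of evaluation points.
x_eval = np.linspace(-1.0, 1.0, 30)
preds = []
for idx in np.array_split(rng.permutation(N), workers):
    alpha = kernel_sgm(x[idx], y[idx])
    K_eval = np.exp(-((x_eval[:, None] - x[idx][None, :]) ** 2) / 0.1)
    preds.append(K_eval @ alpha)
f_hat = np.mean(preds, axis=0)
```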




Publication date: 2017